Chapter 3 Instrumental Conditioning


Overview 

This chapter discusses research and theory on instrumental conditioning. In 
instrumental conditioning, an organism is reinforced if it makes a response (R) 
in a certain stimulus (S) situation. For instance, Thorndike's cats were reinforced 
by escape and food if they hit the correct knob in his puzzle box. Just as
classical conditioning is associated with Pavlov, instrumental conditioning is
sometimes associated with Thorndike, but the association is not as strong because
its use and study did not really originate with Thorndike. In contrast with
classical conditioning, the discovery of which came as something of a surprise,
instrumental conditioning is what everybody means by learning. It has been used by
teachers and parents since time immemorial, and there has never been any lack 
of speculation as to how it should be used. Thorndike was simply the first to 
propose a scientific theory of its operation. 

Most of what happens in a classroom can be thought of as instrumental 
conditioning. Consider a child learning that the sum of 3 and 4 is 7. The
stimulus can be thought of as "3 + 4," the response as "7," and the reinforcer
as the teacher's approval. Or consider a student learning to read a word. The stimulus
is the orthographic representation, the response is saying the word, and the 
reinforcement might be some sort of social approval. Similarly, parents' shaping 
of their children's behavior can be conceived of as instrumental conditioning— 
for example, parents rewarding children with money for cleaning their rooms. 

Although these instances of human learning can be considered instrumental
conditioning, they differ in an important way from the situation of Thorndike's
cats. In the examples given here, humans are told the contingencies that are 
operative, whereas Thorndike's cats had to discover them. Sometimes humans 
do find themselves in instrumental conditioning situations in which they must 
discover the contingency. For instance, many students feel they have to discover 
by trial and error what kind of an essay will earn a high grade from a teacher. 

This chapter focuses on instrumental conditioning in animals. The issues 
involved in instrumental conditioning in humans occupy center stage in the
later chapters on memory, skill acquisition, and inductive learning. However, as
shown later in this chapter, humans placed in the same instrumental conditioning
paradigms as animals do produce similar behavior.

The typical instrumental conditioning experiment requires 
organisms to discover that a response in a stimulus situation 
produces a reinforcement. 


Classical and Instrumental Conditioning Compared

Contrasting the procedures used in classical and instrumental conditioning 
helps define them both. In classical conditioning, the experimenter sets up a 
certain contingency such that if a particular stimulus condition occurs, another 
stimulus will occur. For instance, if a dog is in the experimental apparatus and a 
light flashes, the dog will be given food. In instrumental conditioning, the 
experimenter sets up a contingency such that if a particular stimulus condition 
occurs and if the organism emits a response, then a particular reinforcer will 
occur. For instance, if a rat is in a Skinner box and it presses a lever, a pellet will 
appear in the feeder. Thus, the difference is that in instrumental conditioning, 
the reinforcer (which is like a US) is contingent on the conjunction of stimulus 
and response, whereas in classical conditioning it is contingent only on the 
stimulus. Thus, in instrumental conditioning the organism can control whether 
the reinforcer occurs. 
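
The two contingencies can be written as small functions. The sketch below is only an illustration: the function names and the boolean arguments are assumptions made for the example, not part of any experimental protocol. It shows that the classical outcome ignores the response, while the instrumental outcome requires both the stimulus and the response.

```python
def classical_us_delivered(stimulus_present: bool, response_made: bool) -> bool:
    """Classical contingency: the US depends on the stimulus alone."""
    # The response argument is accepted but deliberately ignored.
    return stimulus_present


def instrumental_reinforcer_delivered(stimulus_present: bool, response_made: bool) -> bool:
    """Instrumental contingency: the reinforcer requires stimulus AND response."""
    return stimulus_present and response_made


# Enumerate every combination of stimulus and response.
for s in (False, True):
    for r in (False, True):
        print(
            f"stimulus={s!s:5} response={r!s:5} "
            f"classical={classical_us_delivered(s, r)!s:5} "
            f"instrumental={instrumental_reinforcer_delivered(s, r)}"
        )
```

Only in the instrumental function does changing the response argument change the outcome, which is the sense in which the organism controls the reinforcer.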

If the organism successfully learns in either situation, it begins to behave
as if it had figured out the experimenter's contingency. In the case of classical 
conditioning, it begins to perform a response (the CR) in preparation for the US. 
In the case of instrumental conditioning, it begins to emit the response if it finds 
the reinforcer desirable. The fundamental learning is the same in both cases. The 
organism is learning to form an association between an antecedent configuration
of elements (stimuli and, in instrumental conditioning, a response as well)
and a consequence that can be predicted from these antecedents. Thus, both 
paradigms involve learning environmental contingencies. The difference 
between the two involves the role of the response. In classical conditioning, the 
organism cannot control the resulting US, but its response can prepare for it. In 
instrumental conditioning, the organism's response determines whether the 
resulting reinforcer will occur. 

Much debate has occurred over whether the process of learning is the 
same in both classical conditioning and instrumental conditioning. Classical 
conditioning has often been considered an automatic process, and instrumental 
conditioning a voluntary process. However, as noted in the previous chapter, 
specifying which behaviors are automatic and which are voluntary can be
problematic. Interest in this distinction has waned, and attention has shifted to
the behavioral similarities between these two types of conditioning, with the
implicit assumption that the two kinds of conditioning involve the same learning
process. Both kinds of conditioning show the same effects of practice, both
extinguish in the same way when the contingency is eliminated, and both show
spontaneous recovery. Both kinds of conditioning can be hurt if a delay is placed
in the contingency. Both paradigms result in successful conditioning only if
there is a contingency among the elements (not just a contiguity). With respect
to stimulus control, both show blocking effects, both can show configural
learning, and both show similar generalization and discrimination processes. In
addition, both show effects of associative bias. Since classical and instrumental
conditioning are so similar, this chapter essentially uses research on instrumental
conditioning to expand on the nature of conditioning in general.

Instrumental and classical conditioning share many similar
behavioral properties.

What This Chapter Covers 

The chapter focuses on the same four questions that organized much of the
discussion in the previous chapter.

What is associated? 

What is the conditioned stimulus? 

What is the conditioned response? 

What is the nature of the association? 

After addressing these questions, this chapter considers the similarity between 
conditioning and causal inference and the evidence about the important role of 
a particular brain structure, the hippocampus, in conditioning. 


What Is Associated? 

Instrumental conditioning involves a stimulus followed by a response followed 
by a reinforcement. For instance, a dog might learn to respond to the stimulus 
"sit" with the response of sitting and receive food as a reward. As in the case of 
classical conditioning, a number of possibilities exist regarding what is
associated to what. One possibility is that the stimulus becomes associated to the
response. In this case, the reinforcer would stamp in the association but would 
not be part of the association. This was the original idea of Thorndike and some 
of the early learning theorists. However, nearly from the beginning there was 
evidence that organisms also develop specific expectations about the reinforcer.
For instance, Tinklepaugh (1928) showed that monkeys registered disappointment
when an expected reinforcer (slice of a banana) was replaced by a less valued
reinforcer (lettuce). One monkey threw down the lettuce (which it would
normally eat) and shrieked at the experimenter in anger. This result would seem
to imply that the reinforcer is part of the association the animal had learned. 

Colwill and Rescorla (1985a, 1985b, 1986, 1988) argued that organisms 
develop associations involving all three terms—the stimulus, the response, and 
the reinforcer. They showed that organisms can learn to expect specific reinforcers 
to specific responses. For instance, rats learned to associate different types of food 
pellets to two different responses (lever pressing and chain pulling). When fed
with one kind of food pellet outside the experiment, they performed a
predominance of responses that yielded the other kind of food pellet. Colwill and
Rescorla argued that organisms develop expectations; that is, if a certain response
is emitted in the presence of a certain stimulus, it produces a certain reinforcer.

Colwill and Rescorla (1986) used another reinforcer devaluation paradigm 
to make a similar point. They trained rats to make two different responses— 
pushing a rod to the left and to the right. One response was always rewarded 
with food and the other with a sugar solution. Then one of the reinforcers was 
paired with an injection of lithium chloride to produce a taste aversion to that 
reinforcer. The rate of response associated with the devalued reinforcer 
decreased. This result would not occur if the association was just between
stimulus (rod) and response (pushing left or right), but it would occur if the
association also involved the reinforcement.

One might argue from these studies that what the animal has really learned 
is a two-term association between the response and the reinforcement. The
studies just cited do not show that the animal will make the response only in a
particular stimulus situation. However, in more recent research Colwill and
Delamater (1995) have shown that animals (rats) will show these reinforcement
expectations only in the situation where the reinforcements have been contingent
on the response. For instance, reinforcer devaluation suppresses the response only
when the appropriate condition for that response is met. Suppose the rat
learned that pulling a handle would produce food pellets in the presence of a tone
but liquid sucrose in the presence of a flashing light. If satiated on sucrose, this
animal would stop pulling the handle only in the presence of the flashing light.

In instrumental conditioning, organisms are learning a three-term
contingency: that a response in a particular stimulus situation
will be followed by a reinforcement.
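
A three-term contingency can be pictured as a lookup from a (stimulus, response) pair to an expected reinforcer. The sketch below encodes the tone and flashing-light example from the text. The dictionary layout, the function name `will_respond`, and the idea of representing satiation as a set of devalued reinforcers are assumptions made for illustration, not a model proposed by Colwill and Delamater.

```python
# Hypothetical encoding of the learned contingencies from the example:
# (stimulus, response) -> expected reinforcer.
contingencies = {
    ("tone", "pull handle"): "food pellets",
    ("flashing light", "pull handle"): "liquid sucrose",
}


def will_respond(stimulus: str, response: str, devalued: set) -> bool:
    """Respond only if an expected reinforcer exists and is still valued."""
    expected = contingencies.get((stimulus, response))
    return expected is not None and expected not in devalued


# Satiation on sucrose is modeled (as an assumption) by marking it devalued.
devalued_now = {"liquid sucrose"}
for stimulus in ("tone", "flashing light"):
    print(stimulus, "->", will_respond(stimulus, "pull handle", devalued_now))
```

A two-term lookup keyed on the response alone could not produce this pattern, because the same response would map to a single entry regardless of the stimulus.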


Associations Between Responses and Neutral Outcomes 

The discussion thus far has reviewed evidence that organisms can learn
associations between responses and reinforcing stimuli. What about associations
between responses and neutral stimuli? Organisms can learn about associations
between stimuli and other neutral stimuli in classical conditioning (see the
discussion of sensory preconditioning and second-order conditioning in Chapter
2). Can they similarly acquire such neutral associations in an instrumental
conditioning paradigm? In one experiment by St. Claire-Smith and MacLaren
(1983), as part of their free exploration of a Skinner box, rats learned that a lever
press produced a noise. The experimental group was then trained on pairings of 
the noise with food without a lever present in the box, and the control group 
was trained on pairings of light and food. When the lever was reintroduced (but 
no food was given), rats in the experimental group pressed the bar more often 
than did the control rats who had not learned the noise-food pairing. As a result 
of their earlier free exploration, they appeared to have learned that lever
pressing produced the noise. Putting this result together with the classical
conditioning of noise and food, they acted as if they inferred that lever pressing
might also produce food. Thus, it appears that organisms are capable of learning
associations between a response and any stimulus that follows, just as they are
capable of learning associations between a CS and any stimulus that follows in a
sensory preconditioning experiment. The resulting stimulus need not be reinforcing.

The ability to form associations between responses and neutral outcomes 
is critical to learning complex chains of responses, only the last of which involves 
reinforcement. Consider a rat learning to run a maze. It must learn a sequence of 
associations of the sort that making a turn in a certain direction in a certain part 
of the maze leads to another part of the maze. There is nothing inherently
reinforcing about such turn-maze associations. Only the final turn is followed
by food (and even that turn may be associated not directly with food but
with a part of the maze associated with food). However, the rat has to
learn all these associations in order to put them together to run the maze. The 
latent learning experiments (see discussion under Tolman in Chapter 1) showed 
that rats could learn all these neutral associations before they learned there was 
food in part of the maze. When they learned where food was, they could recruit 
this neutral information to help them get to it. 

Organisms can learn that certain responses produce neutral
outcomes and combine this information with other experiences
to obtain reinforcement.
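
This kind of combination can be sketched as a search over learned associations. In the example below, each association is a directed link from an event to the event that follows it, and a breadth-first search asks whether a chain leads from a response to food. The event names, the two dictionaries, and the use of graph search are illustrative assumptions; the text does not claim that rats perform such a search.

```python
from collections import deque

# Hypothetical associations for the experimental group: event -> what follows.
experimental = {
    "lever press": ["noise"],  # learned during free exploration
    "noise": ["food"],         # learned from noise-food pairings
}

# Control group: same exploration, but food was paired with a light instead.
control = {
    "lever press": ["noise"],
    "light": ["food"],
}


def chain_to(goal, start, links):
    """Return the shortest chain of events from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print("experimental:", chain_to("food", "lever press", experimental))
print("control:     ", chain_to("food", "lever press", control))
```

Only the experimental associations contain a chain from the lever press to food, which parallels the higher rate of pressing in that group.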


Secondary Reinforcement 

The previous section described a situation in which the rat first learned the
association bar press-noise and then the association noise-food. This situation is
similar to sensory preconditioning in classical conditioning in that the organism first
learns a neutral association, then a biologically significant association, and finally 
puts them together. Reversing the order of learning the associations results in the 
equivalent of second-order conditioning; the animal first learns the biologically 
significant association noise-food and then bar press-noise. The noise acquires the
ability to reinforce the bar press for an animal trained with such a procedure, and
the animal will press the bar just for the click without food (Skinner, 1938). The 
noise is said to be a secondary reinforcer, or a conditioned reinforcer. 

The classic example of a secondary reinforcer is money for humans, which 
can be extremely reinforcing but has no biological function in and of itself; 
human beings have learned to associate money with more primary reinforcers. 
Examples of the many other such secondary reinforcers in human society 
include letter grades in courses and promises of favors. In one experiment,
Saltzman (1949) presented rats with food in a white goal box. Then he
introduced them to a T-maze, where the rats had to choose between a path leading
to a white box and a path leading to a black box. The rats learned to take the
path that led to the white box even though the box did not contain any food.
The white box had acquired the ability to reinforce behavior. With enough
exposure to the white box in the maze without food, the rats extinguished and no
longer chose that path. In like manner, when the currency in a particular
country loses so much value that it is useless, people cease seeking the money.

The functions of secondary reinforcers such as money are clear in the 
human world. (It turns out that chimps are also capable of treating coins and 
other tokens as money; see Cowles, 1937; Wolfe, 1936.) Secondary reinforcers 
are promises of primary reinforcement, and people know that they can be 
exchanged for primary reinforcers. It is not clear whether it is always
appropriate to attribute such a cognitive explanation to secondary reinforcers in
lower animals, but it does appear that for many species secondary reinforcers are
good at bridging delays in reinforcement. For instance, if a 5-sec delay is inserted
between pecking and reinforcement, a pigeon will not peck a key at a
substantial rate. On the other hand, if a green light comes on immediately after the peck
and the pigeon has seen the light paired with food, the pigeon will learn to peck 
rapidly at the key. The green light becomes a secondary reinforcer that enables 
the pigeon to bridge the delay in reinforcement (Staddon, 1983). 

A secondary reinforcer is a previously neutral stimulus that 
has acquired the ability to reinforce behavior as a consequence 
of being paired with a primary reinforcer. 


What Is the Conditioned Stimulus? 

As reviewed in Chapter 2, some, though not all, variations on the original
stimulus are effective in producing the response. The extension of the conditioned
response to new stimuli is called generalization, and the restriction of the
conditioned response from other stimuli is called discrimination. The phenomena of
stimulus generalization and stimulus discrimination occur in instrumental
conditioning just as they do in classical conditioning, and they have been
studied much more extensively in the domain of instrumental conditioning.



Generalization 

In a prototypical study of stimulus generalization, Guttman and Kalish (1956) 
trained pigeons to peck at a key of a particular color (measured by wavelength). 
During 60-sec intervals, the key was lit with a certain color, and pecking produced 
a reinforcement of food. These intervals were separated by 10-sec intervals of total 
darkness during which the pigeons did not respond. Following the experiment,
the key was illuminated at different wavelengths, and the number of key pecks
was recorded to test for generalization. Four conditions were defined by the
wavelength of the original key: 530 nm (green), 550 nm (green-yellow), 580 nm
(yellow), or 600 nm (yellow-orange). After training, the pigeons were tested
without reinforcement. Figure 3.1 shows, for each training condition, the number of
responses for different test wavelengths. Pigeons showed maximal response when the
test wavelength matched the wavelength on which they were trained. Their rate
of responding decreased as the difference increased between training and test
wavelength. These results do not simply reflect the ability to discriminate the
study stimulus from the test stimulus (that is, that pigeons responded to a test
color to the degree that they thought it was the study color); pigeons are capable
of making much sharper discriminations than those illustrated in Figure 3.1. In
some sense, pigeons were registering their "opinion" on whether this difference in
wavelength was likely to be relevant to their reinforcement.

The curves in Figure 3.1 are often referred to as generalization gradients.
Many generalization gradients are not as steep as those depicted in Figure 3.1.
Figure 3.2 from Jenkins and Harrison (1960) illustrates a generalization gradient
from an experiment in which pigeons were trained to peck when a key was lit
and a 1000-Hz tone was on and then were tested for tones that varied from 300
to 3500 Hz. The data are plotted in terms of the percentage of all responses
given to that tone.¹ The generalization gradient curve is nearly flat, showing
little decrease in response as the tone varied from the training stimulus of 1000
Hz. Pigeons were registering their "opinion" that the actual pitch of the tone was
irrelevant to whether reinforcement would be delivered. The pigeons behaved as
if the only critical feature was that the key was lit and that it did not matter what
the tone was. In effect, they ignored the pitch of the tone.

FIGURE 3.1 Pigeons are trained to peck at lights with wavelengths of 530, 550,
580, and 600 nm. The curves show the total responses to stimuli of similar
wavelengths. These are cumulative responses for 6 min. (From Guttman & Kalish,
1956.)

FIGURE 3.2 Rate of responding to tones of various frequencies for pigeons
trained to respond to tones of 1000 Hz. (From Jenkins & Harrison, 1960.)
[Plot not reproduced. Horizontal axis: Frequency, Hz.]

FIGURE 3.3 Gradients of inhibition for three pigeons following learning where
only 570 nm was not reinforced. [Plot not reproduced. Horizontal axis:
Wavelength, nm.]

Figure 3.1 shows a positive generalization gradient, but negative
generalization gradients are possible, too. Terrace (1972) created a situation in which
pigeons could receive reinforcement for pecking when the light was homogeneous
white light and not when the light was a specific color (570 nm). They
were then tested with lights of specific colors. Figure 3.3 shows their rate of
responding as a function of wavelength. The minimum rate of responding
occurred at the nonreinforced wavelength; the rate gradually recovered as the
wavelength moved away from this value.

1 The original Jenkins and Harrison data included a no-tone condition, which is not
shown.

Organisms have biological predispositions to treat certain dimensions as 
significant and certain differences on these dimensions as important in defining 
the CS, while they ignore other dimensions and differences. Organisms may pay 
attention to different aspects of a stimulus in different situations. For instance, 
Foree and LoLordo (1973) trained pigeons with a combined CS of light and 
tone. When the pigeons were reinforced with food, it was the light that
controlled their behavior. When they were reinforced with shock, it was the tone.
This finding may reflect the fact that visual identification is critical to identifying 
food for pigeons but sounds often signal danger. We have already discussed 
such associative biases with respect to classical conditioning, and we will have 
more to say about them later in this chapter. 

Organisms spontaneously generalize the CS, ignoring certain
dimensions and certain differences in other dimensions.
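
A generalization gradient can be pictured as a curve that peaks at the training value and falls off with distance from it. The sketch below uses a Gaussian curve purely as an illustrative assumption; the function name `gradient`, the `width` parameter, and the numeric widths are hypothetical and are not fitted to the Guttman and Kalish or Jenkins and Harrison data. A small width gives a steep gradient like those in Figure 3.1, and a very large width gives a nearly flat one like that in Figure 3.2.

```python
import math


def gradient(test_value, trained_value, width, peak=1.0):
    """Assumed Gaussian generalization gradient (relative response strength)."""
    distance = test_value - trained_value
    return peak * math.exp(-(distance ** 2) / (2 * width ** 2))


# Steep gradient: wavelength (nm) matters, so responding drops off quickly.
steep = [round(gradient(w, 550, width=15), 2) for w in (520, 535, 550, 565, 580)]

# Nearly flat gradient: tone frequency (Hz) is treated as irrelevant.
flat = [round(gradient(f, 1000, width=10000), 2) for f in (300, 1000, 3500)]

print("steep:", steep)
print("flat: ", flat)
```

In this picture, the width stands in for how relevant the organism treats the dimension as being: the less relevant the dimension, the wider and flatter the curve.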


Discrimination 

Although organisms have biological predispositions to attend to certain
dimensions and differences and to ignore others, they will change their behavior if
experience contradicts their biases. For instance, what happens if the organism
is exposed to multiple stimuli that it initially treats as equivalent, but learns that
some are accompanied by reinforcement and others are not? The simplest
possibility is an experiment in which the presence of a stimulus is associated with
reinforcement and its absence is not. Jenkins and Harrison (1960) looked at 
what would happen in such a condition. Recall from Figure 3.2 that, when there 
was only a positive stimulus of 1000 Hz, pigeons pecked at the lighted key no 
matter what the frequency of the tone. Jenkins and Harrison compared this
condition with a condition of differential training: when the key was lit and there
was a 1000-Hz tone, the pigeons were reinforced for pecking the key, but when 
the key was lit and there was no tone, they were not reinforced for pecking the 
key. Figure 3.4 shows the results. There are strong generalization gradients 
around 1000 Hz. The effect of the discrimination training was to indicate that 
the tone was relevant. 

This experiment compared the presence of a tone with the absence of a 
tone, in contrast with many other experiments in which different values of a 
stimulus were positive and negative. In another experiment by Jenkins and 
Harrison (1962), pigeons were first reinforced for pecking in the presence of a 
1000-Hz tone and not in the absence of a tone, as described earlier. Then the 
pigeons were trained to respond to a 1000-Hz tone but not to a 950-Hz tone. 
Figure 3.5 compares the generalization gradients of a pigeon before and after
learning that the 950-Hz tone was negative. The generalization gradient is much
steeper after the animal was trained to discriminate between a 1000-Hz tone
and a 950-Hz tone. This pigeon actually showed maximum response to a tone
of 1050 Hz, which is shifted away from the negative 950-Hz tone. This kind of
"overshoot" is common in human behavior. If students observe that a 400-word
essay got a C and a 500-word essay got an A, they might write a 600-word essay.
To explain this phenomenon, the next section considers a popular theory of
discrimination learning.

FIGURE 3.4 Generalization gradients following differential training
with a 1000-Hz tone. Individual gradients are based on the means of
three generalization tests.

Organisms can be trained to discriminate among stimulus values
and to respond only to certain ones.


FIGURE 3.5 Generalization gradients obtained from a pigeon
trained to respond to a 1000-Hz tone and then later trained to
discriminate it from a 950-Hz tone. (From Jenkins & Harrison, 1962.)

Spence's Theory of Discrimination Learning 

Spence (1937), a learning theorist strongly influenced by Hull (see Chapter 1), 
developed a theory of how training on positive and negative stimuli combined 
to produce a net generalization gradient. Although more modern versions of his 
theory feature various technical differences that make them more sophisticated 
and accurate (e.g., Blough, 1975), Spence's theory is described here because it 
contains the essential ideas and is the original proposal. Earlier we learned that 
if an animal is reinforced for the response in the presence of a stimulus, it builds 
a positive generalization gradient (Figure 3.1) around the stimulus, and if an 
animal is not reinforced for the response in the presence of a stimulus, it builds 
a negative generalization gradient (Figure 3.3) around that stimulus. Spence's 
basic idea was that behavior in discrimination training is just a combination of 
these positive and negative generalization gradients. Figure 3.6 illustrates his 
analysis. Suppose a circle of 256 cm² is the positive stimulus and one of 160 cm²
is the negative stimulus. Figure 3.6 illustrates the positive generalization
gradient around 256 and the negative generalization gradient around 160.
Subtracting one from the other produces the net generalization gradient. Note
that the positive peak of this gradient has been shifted from 256 in a direction
away from the negative stimulus. This is the prediction of a peak shift: the
stimulus that evokes the most responding is not the positive training stimulus
but one shifted away from it and the negative stimulus. This prediction is
somewhat counterintuitive, since the organism is responding more to a stimulus that
it has not been trained on than to a stimulus it has been trained on. This
prediction is typically confirmed in discrimination experiments of this sort. Figure
3.5 from Jenkins and Harrison (1962) is one example of this peak shift; the
pigeon responded more to a 1050-Hz tone than to the 1000-Hz tone with which
it had been trained.

FIGURE 3.6 Spence's theory of how inhibitory influences from the negative
stimulus subtracted from excitatory influences of the positive stimulus yield a
net generalization gradient. [Plot not reproduced. Horizontal axis: Stimulus size.]


Spence proposed that discrimination learning resulted from 
subtracting generalization gradients for nonreinforced stimuli 
from generalization gradients for reinforced stimuli. 
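
Spence's subtraction can be sketched numerically. The example below assumes Gaussian-shaped excitatory and inhibitory gradients centered on the positive (256) and negative (160) stimuli; the Gaussian form, the shared width of 80, and the inhibitory height of 0.6 are illustrative assumptions, not values taken from Spence (1937). The point is only that the difference of two such curves peaks at a value beyond the positive stimulus, on the side away from the negative one.

```python
import math

S_PLUS, S_MINUS = 256, 160  # positive and negative training stimuli (cm^2)
WIDTH = 80                  # assumed spread of both gradients
INHIBITION = 0.6            # assumed height of the inhibitory gradient


def bell(x, center, height):
    """Gaussian-shaped gradient centered on a training stimulus."""
    return height * math.exp(-((x - center) ** 2) / (2 * WIDTH ** 2))


def net_strength(x):
    """Excitation around S+ minus inhibition around S-."""
    return bell(x, S_PLUS, 1.0) - bell(x, S_MINUS, INHIBITION)


# Scan a range of stimulus sizes and find where the net gradient is highest.
sizes = range(100, 501)
peak = max(sizes, key=net_strength)
print("net gradient peaks at stimulus size", peak)
print("peak lies beyond S+:", peak > S_PLUS)
```

At the positive stimulus itself the excitatory curve is flat while the inhibitory curve is still falling, so the net curve is still rising there; that is why the maximum lands past 256 in this sketch.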

Relational Responding: Transposition 

Spence extended his theory to a simultaneous presentation procedure in which
the organism must select between two stimuli. Suppose that an organism is
trained to discriminate between stimuli of 160 cm² and 256 cm² given the
generalization gradients illustrated in Figure 3.6 and is then given a choice between
two stimuli of 256 and 409 cm². Because of the peak shift, the organism should
select 409 rather than the original positive 256. A number of experiments
supported this prediction of a preference for the shifted stimulus rather than the
original.
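
The choice prediction can be sketched in the same spirit. In a model of this kind, whether the net gradient favors 409 over 256 depends on how broad the gradients are and how strong the inhibition is, so the example below picks values for which it does. Stimulus size is placed on a logarithmic scale, and both gradients are Gaussian with an assumed width of 2.2 log-steps and an assumed inhibitory height of 0.9. These are hypothetical choices for illustration, not Spence's parameters.

```python
import math

S_MINUS, S_PLUS = 160, 256  # training stimuli (cm^2)
WIDTH = 2.2                 # assumed gradient width, in log-size steps
INHIBITION = 0.9            # assumed height of the inhibitory gradient


def log_steps(size):
    """Position of a size on a log scale where 160 -> 0 and 256 -> 1."""
    return math.log(size / S_MINUS) / math.log(S_PLUS / S_MINUS)


def bell(x, center, height):
    """Gaussian-shaped gradient centered on a training stimulus."""
    return height * math.exp(-((x - center) ** 2) / (2 * WIDTH ** 2))


def net_strength(size):
    """Excitation around S+ minus inhibition around S-, on the log scale."""
    x = log_steps(size)
    return bell(x, log_steps(S_PLUS), 1.0) - bell(x, log_steps(S_MINUS), INHIBITION)


def choose(a, b):
    """Pick whichever of two simultaneously presented stimuli is stronger."""
    return a if net_strength(a) > net_strength(b) else b


print("training pair (160 vs 256):", choose(160, 256))
print("test pair (256 vs 409):   ", choose(256, 409))
```

Under these assumptions the model prefers 256 in the training pair and 409 in the test pair, even though each choice is computed from absolute stimulus values rather than from a "larger than" relation.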

This result was explained in another way by Kohler (1955) and other 
Gestalt psychologists. Transposition was the term Kohler used to indicate that 
the organism had transferred the relationship between one pair of stimuli to 
choosing between a different pair. The Gestalt psychologists argued that the
organism was responding to the relationship between the two training stimuli
and had learned to select the larger. A long history of controversy has
surrounded relational accounts and accounts like that of Spence, which propose
that the organism responds to the absolute value of the stimulus. This
controversy has been settled with the conclusion that both sides are right. Under
appropriate circumstances an organism can be trained to respond to a
relationship between two stimuli, and under other circumstances it can be trained to
respond to the absolute properties of the two stimuli.

An experiment by Lawrence and DeRivera (1954) provides an example of 
animals responding relationally. Figure 3.7 illustrates the stimuli used: cards of 
two shades of gray. In Figure 3.7 these shades are indicated by the numbers 1 
through 7: 1 is white, 7 is black, and the other numbers denote the various 
shades between. The bottom half of the card was always 4 and the top half
varied. When the top half was lighter (1 to 3), rats were trained to turn right; when
it was darker (5 to 7) they were trained to turn left. The critical test occurred after
training. The rats were presented with a card with 3 on top and 1 on the bottom.
Both 3 and 1 were associated with moving right, but the top was darker than the
bottom and this relation was associated with turning left. The rats responded to
the relational information and turned left. In contrast, when they were tested
with a 5 on the top and 7 on the bottom, they went right, again confirming the
relational theory.

FIGURE 3.7 Stimuli used by Lawrence and DeRivera (1954). The numbers 1 through
7 denote shades of gray.

              Training, right turn    Training, left turn    Test stimuli
Top half      1    2    3             5    6    7            3    5
Bottom half   4    4    4             4    4    4            1    7
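
The two accounts make opposite predictions for the test cards, which can be checked with a small sketch. The rule functions below are assumptions written for illustration: the absolute rule is taken to respond to the shade of the top half alone, as in training, and the relational rule compares the top half with the bottom half. Shades run from 1 (white) to 7 (black).

```python
def absolute_rule(top: int, bottom: int) -> str:
    """Assumed absolute account: respond to the trained shade of the top half."""
    return "right" if top < 4 else "left"  # 1-3 trained right, 5-7 trained left


def relational_rule(top: int, bottom: int) -> str:
    """Relational account: top lighter than bottom -> right, darker -> left."""
    return "right" if top < bottom else "left"


training_cards = [(1, 4), (2, 4), (3, 4), (5, 4), (6, 4), (7, 4)]
test_cards = [(3, 1), (5, 7)]

# Both rules agree on every training card, so training alone cannot separate them.
assert all(absolute_rule(t, b) == relational_rule(t, b) for t, b in training_cards)

for top, bottom in test_cards:
    print(
        f"top={top} bottom={bottom}: "
        f"absolute={absolute_rule(top, bottom)}, "
        f"relational={relational_rule(top, bottom)}"
    )
```

The rats' reported choices (left for 3 over 1, right for 5 over 7) match the relational rule and not the absolute one.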

The fact that organisms can encode and respond to either relational or 
absolute information raises an extremely troublesome problem in the discussion 
of what constitutes the conditioned stimulus. It is not immediately apparent 
how an organism will encode a particular stimulus. One organism may encode 
it one way (e.g., absolute size) and another a different way (relative size). 
Without knowing how the stimulus is encoded, it is not possible to know what 
patterns of generalization and discrimination will take place. Researchers and 
theorists typically assume what seems to be the obvious encoding. But what 
seems obvious to the experimenter may not seem obvious to the organism. 
Chapter 6 has more to say about how information is represented, particularly in 
the human case. 


The Gestalt psychologists proposed that organisms responded 
to the relationship between stimulus values rather than to the 
absolute values. 


Dimensional or Attentional Learning 

Thus far we have focused on patterns of generalization and discrimination along a 
single dimension. However, most stimuli have many dimensions. For instance, 
visual stimuli have color, size, shape, and position in space. In addition, there are 
various background contextual stimuli, such as the appearance of and possible 
sounds in the laboratory. How is the organism to identify which dimension or
dimensions determine reinforcement? The last chapter described one theory of
dimensional combination for classical conditioning, the Rescorla-Wagner theory. 
According to that theory, various dimensions or stimuli divided a total associative 
strength according to how reliably they were associated with the US. In effect, they 
competed for association to the US. A similar process seems to occur in instru¬ 
mental conditioning in which stimuli compete for association to the reinforcer. 

Blocking phenomena can also be shown in instrumental conditioning 
(Mackintosh, 1974). Blocking phenomena occur when one stimulus or dimen¬ 
sion becomes so strongly associated that it blocks out other dimensions. The 
blocking data are among the strongest data in support of the Rescorla-Wagner 
theory. On the other hand, in classical conditioning, there is also evidence that 
learning cannot always be simply a matter of responding to individual dimen¬ 
sions because animals can be trained to respond to various combinations of 
dimensions but not to the individual dimensions (Razran, 1971). 
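
The competition for associative strength and the blocking effect described above can be sketched with the standard Rescorla-Wagner update, in which every cue present on a trial changes by a common fraction of the prediction error (the asymptote minus the summed strength of the cues present). This is a minimal sketch: the learning rate, asymptote, trial counts, and cue names A and B are illustrative assumptions, not values from any experiment cited here.

```python
def rescorla_wagner(trials, rate=0.3, asymptote=1.0, strengths=None):
    """Apply the Rescorla-Wagner update over a sequence of reinforced trials.

    trials    -- list of tuples naming the cues present on each trial
    rate      -- assumed single learning rate (salience and US rate combined)
    asymptote -- maximum total associative strength the reinforcer supports
    strengths -- optional starting strengths carried over from earlier training
    """
    V = dict(strengths or {})
    for cues in trials:
        for cue in cues:
            V.setdefault(cue, 0.0)
        error = asymptote - sum(V[cue] for cue in cues)  # shared prediction error
        for cue in cues:
            V[cue] += rate * error
    return V


# Blocking group: A alone is trained first, then the compound AB.
after_a = rescorla_wagner([("A",)] * 30)
blocked = rescorla_wagner([("A", "B")] * 30, strengths=after_a)

# Control group: the compound AB is trained from the start.
control = rescorla_wagner([("A", "B")] * 30)

print("B after blocking:", round(blocked["B"], 4))
print("B in control:    ", round(control["B"], 4))
```

Because A already predicts the reinforcer almost perfectly when the compound is introduced, the shared error is near zero and B gains almost nothing, whereas in the control group A and B split the available strength roughly equally.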

Instrumental conditioning paradigms have been used to explore a some¬ 
what different kind of competition among dimensions. Organisms have limited 
encoding capacity and can only pay attention to so many dimensions at a time. 
With experience, they can change which dimensions they attend to. For 
instance, flat generalization gradients can be transformed into peaked general¬ 
ization gradients by discrimination experiments that simply make the dimen¬ 
sion relevant (contrast Figures 3.2 and 3.4). 

Another kind of evidence for dimensional learning (sometimes called 
attentional learning) comes from experiments that involve learning multiple, 
successive discriminations. The basic paradigm is illustrated in Figure 3.8. A 
value on one dimension is reinforced. (In the training example of Figure 3.8, this 
is red on the dimension color.) After mastering this discrimination, the subject 
is transferred to a condition in which the opposite value on the dimension 
becomes positive (a reversal shift—in Figure 3.8, yellow is now positive) or 
another dimension is used (a nonreversal shift—in Figure 3.8, squares now 
become positive). Reversal shifts might appear more difficult because the organ¬ 
ism must respond in the completely opposite way. On the other hand, nonre¬ 
versal shifts might seem more difficult because the organism has to learn to pay 
attention to a new dimension. Most humans and higher apes find reversal shifts 
easier, whereas very young children and nonprimates find nonreversal shifts 
easier (Mackintosh, 1975). Interestingly, adult humans with damage to their 
prefrontal cortex often have difficulties in reversal conditions as well (Owen, 
Roberts, Hodges, Summers, Polkey, & Robbins, 1993). The frontal cortical areas 
are much expanded in primates and mature in children later than most other 
neural structures. Chapters 6 and 9 will elaborate on the role of the frontal cor¬ 
tex in primate and human learning. 

Figure 3.8 also illustrates intradimensional shift, which requires the subject
to learn to discriminate between new values (such as blue and green) on the
previously relevant dimension. This situation is contrasted with learning new
values on the other dimension (extradimensional shift). Intradimensional shifts
are almost always easier than extradimensional shifts (e.g., Mackintosh & Little,
1969). The contrast between intradimensional and extradimensional shifts,
unlike the contrast between reversal and nonreversal shifts, does not require
that the organism respond to the same stimuli in different ways. Hence, there
are no competing responses to the stimuli from the original training. A major
function of the prefrontal cortex appears to be the inhibition of inappropriate
responses (Dempster, 1992; Diamond, 1989; Roberts, Hager, & Heron, 1994).

FIGURE 3.8 Schematic representation of stimuli and reinforcement contingencies
for reversal shifts, nonreversal shifts, intradimensional shifts, and
extradimensional shifts. Panels: Training (positive R, negative Y), Reversal
shift (positive Y, negative R), Nonreversal shift, Intradimensional shift, and
Extradimensional shift.

Thus, it appears that one thing that all organisms learn is what dimensions
are relevant. Therefore, all organisms find intradimensional shifts easier than
extradimensional shifts because they do not have to learn to attend to new
dimensions. Organisms with developed and intact prefrontal cortices can also
inhibit competing responses and so find reversal shifts easier than nonreversal
shifts. Lower organisms do not show such dramatic results but still find
intradimensional shifts easier than extradimensional shifts.

Organisms can learn which stimulus dimensions are relevant 
in discrimination learning. 




Configural Cues and Learning of Categories 

In the chapter on classical conditioning, we discussed the evidence that organ¬ 
isms could learn to respond to configurations of dimensions as well as individ¬ 
ual dimensions. Similar demonstrations of configural responding exist in the 
instrumental conditioning domain. In Chapter 2 we described configural cues as 
if they were the exception. However, some theorists (e.g., Pearce, 1994) have 
argued that animals always respond to the total configuration of dimensions 
that a stimulus presents. The different generalization gradients they show on 
different dimensions just reflect the different similarity of these multidimen¬ 
sional stimuli when they are contrasted on single dimensions. 

Strong evidence for multidimensional integration of stimuli indicates that 
organisms respond to categories of objects rather than single dimensions of 
objects. For instance, when we see an object such as a chair we are responding 
to it as a configuration of dimensions that indicate a chair rather than any sin¬ 
gle dimension. Apparently, other organisms also see the world in terms of cat¬ 
egories rather than single dimensions. Figure 3.9 shows some of the stimuli 
shown to pigeons in a discrimination experiment by Herrnstein, Loveland, and 
Cable (1976). Some pigeons were trained to peck at instances of the category 
"tree"; they were trained with some 700 slides of trees and nontrees. The only 
characteristic that the positive pictures had in common (and that discriminated
them from the negative pictures) was that they involved a tree. The positive
pictures could not be discriminated from the negative pictures on the basis of
simple features. Thus, pigeons could only make this discrimination if they knew
what a tree was. For humans, this is a relatively easy discrimination because they 
possess the category of trees. It also turned out to be a fairly easy discrimination 
for pigeons. Pigeons not only were able to learn to make such discriminations, 
but they learned in fewer trials than needed in the simple one-dimensional 
problems described earlier. Moreover, after being trained to discriminate one set 
of pictures of trees from a set of pictures of nontrees, the pigeons were capable 
of generalizing this ability to new pictures that had not been used for training. 

Wasserman, Kiedinger, and Bhatt (1988) demonstrated category learning 
by pigeons in a slightly different paradigm. Pigeons were trained to peck at four 
different keys according to the rules: 


Peck key 1 if the stimulus was one of 10 cat pictures.
Peck key 2 if the stimulus was one of a second set of 10 cat pictures.
Peck key 3 if the stimulus was one of 10 flower pictures.
Peck key 4 if the stimulus was one of a second set of 10 flower pictures.


Pigeons got quite good at discriminating keys 1 and 2 from 3 and 4, corre¬ 
sponding to the cat-flower distinction. However, they had great difficulty in dis¬ 
tinguishing key 1 from 2 (the cat pictures) or key 3 from key 4 (the flower pic¬ 
tures). They found it difficult to learn discriminations within a category. Humans 
would show similar patterns, finding discriminations between categories easy 
and discriminations within categories hard. 

Chapter 10 provides much more information on concept learning, focus¬ 
ing mainly on human learning of concepts. The experiments just described illus¬ 
trate that lower animals as much as humans see the world in terms of categories 
and specific objects and not in terms of single dimensions like colors and 
shapes. Often this meaningful representation of the world is much more salient 
than the simple dimensional representation, and animals find it easier to learn 
discriminations when the discriminating factor is a salient category. 




Animals easily learn to respond to complex dimensional com¬ 
binations that define significant categories. 






What Is the Conditioned Response? 

The next question to address concerns the nature of the response. The tradi¬ 
tional view was that a specific response was being learned. As early as the 1920s, 
however, researchers began to see problems with that particular point of view. 
Muenzinger (1928) trained guinea pigs to press a bar and found that sometimes 


94 





What Is the Conditioned Response? 


they pressed with one paw, sometimes with another, and sometimes even with 
their teeth! Macfarlane (1930) taught rats to swim through a maze for food and 
then found that they were capable of running the maze for food. Lashley (1924) 
taught monkeys to solve a manipulation problem with one hand and found they 
could generalize the solution to the other hand when the first was paralyzed. It 
seems that organisms come to some representation of the functional structure 
of their environment and select their responses appropriately. Thus, the guinea 
pigs in Muenzinger's experiment were learning not that a particular response 
was associated with reinforcement, but rather that depression of the bar was 
associated with reinforcement. As in classical conditioning, the response is the 
organism's adaptation to what it has learned about the environment. 

Skinner (1938) recognized the nature of the response in his definition of 
an operant. Different responses that had identical effects in the environment 
(had identical reinforcement consequences) were defined to be instances of the 
same operant. Organisms can be trained to discriminate among responses that 
appear to have equivalent effects on the environment (e.g., use of the left hand 
versus the right hand to press a bar) if the experimenter sets up reinforcement 
contingencies that differentiate them. However, they behave as if their default 
assumption were that actions with equivalent effects on the world produce 
equivalent rewards—certainly a plausible default assumption. 

Organisms tend not to discriminate among responses that are
equivalent in their effect on the environment.


Maze Learning 

Maze learning by rats provides some of the strongest evidence that the organism's 
response is an adaptation to what it has learned about its environment. Rats are 
animals whose natural environments are much like mazes, and they are skillful at 
learning complex mazes, challenging humans in their ability. As noted in the
discussion of Tolman in Chapter 1, rats' ability to navigate in mazes depends in part
on their developing cognitive maps. They learn the locations of food and other 
objects in space and traverse the maze to get to those locations. However, there is 
also evidence that rats can learn the specific turns involved in navigating a maze. 

More recent research has revealed some of the other ways in which rats 
cope with mazes. Research (e.g., Olton, 1978) has been conducted with a radial 
maze such as that shown in Figure 3.10; the rat is put in the center of the maze, 
and food is placed at the end of each of the eight arms. Rats on their first 
encounter with this maze tended to perform very well, visiting about seven of the 
eight arms in their first eight choices. The rats displayed an amazing ability to 
avoid revisiting these arms.¹ How were they able to explore these mazes so
efficiently? One might think that the rats had some systematic plan such as going
through all the arms in a left-to-right order. This does not seem to be the answer,
however, because they did not display any specific order in visiting the arms.
Rather, the evidence is that rats have good memories for locations and avoid
repeating visits. This is an adaptive trait in their natural environment, where they
need to keep track of where they have been and consumed food. If they have
depleted the food in a particular location, there is no point in repeating the trip.

¹ Only in the second edition did I notice the pun in this sentence.

FIGURE 3.10 A top view of a radial maze.
Source: From D. S. Olton and R. J. Samuelson. Remembrance of places passed:
Spatial memory in rats. Journal of Experimental Psychology: Animal Behavior
Processes, Volume 2. Copyright © 1976 by the American Psychological Association.
Reprinted by permission.
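
To see why visiting about seven of eight arms in eight choices is impressive, it helps to work out what a rat with no memory would achieve. The sketch below assumes, purely for illustration, that such a rat picks each arm uniformly at random and independently on every choice; the function name is hypothetical.

```python
def expected_distinct_arms(n_arms=8, n_choices=8):
    """Expected number of different arms entered under purely random choice."""
    # Probability that one particular arm is never picked in n_choices tries.
    p_never_chosen = (1 - 1 / n_arms) ** n_choices
    # Each arm is visited at least once with probability 1 - p_never_chosen.
    return n_arms * (1 - p_never_chosen)


print(round(expected_distinct_arms(), 2))  # 5.25
```

Under this assumption a memoryless rat would enter only a little more than five different arms on average, so the observed performance is well above what random choice would produce.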

Other research on rats has compared their ability to learn shift versus stay 
strategies in a T-maze (Haig, Rawlins, Olton, Mead, & Taylor, 1983). A T-maze (see
Figure 3.11) is a simple maze in which a rat runs from a start box to a choice point, 
at which it must go in one of two directions. There are goal boxes to the left and 
to the right, and one of them contains food. Shift and stay strategies refer to two 
different principles experimenters have used to determine which goal box to place 
food in. The strategies differ in terms of where to look for food after the first trial. 
If the rat is being trained with a stay strategy, it finds food if it goes down the path 
that had food before. In a shift strategy, it finds food if it goes down the other path. 


FIGURE 3.11 An example of a T-maze. 





Rats find it much easier to learn the shift strategy. This result is just the opposite 
of what would be predicted if rats were learning specific responses, but it is exact¬ 
ly what would be predicted from their foraging habits in the wild, where they need 
to avoid food locations that they have already depleted. Several other species also 
show this tendency to learn better with a shift strategy (e.g., Kamil, 1978).
Interestingly, animals find shift strategies harder to learn when they are not 
allowed to deplete the food in the goal box (Haig, Rawlins, Olton, Mead, & Taylor, 
1983); then they have a reason to return to the same part of the maze. 

Rats navigate in various environments according to a cogni¬ 
tive map, which includes where significant objects like food 
are to be found. 
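
The two baiting rules used in the T-maze studies can be written down in a few lines. This is a simplified sketch: the function names are hypothetical, and it assumes that the baited side on each trial depends only on where food was found on the previous trial.

```python
def baited_side(previous_food_side, rule):
    """Side the experimenter baits next under a 'stay' or 'shift' rule."""
    if rule == "stay":
        return previous_food_side
    if rule == "shift":
        return "right" if previous_food_side == "left" else "left"
    raise ValueError("rule must be 'stay' or 'shift'")


def baiting_sequence(first_side, rule, n_trials):
    """Sequence of baited sides over n_trials, starting from first_side."""
    sides = [first_side]
    for _ in range(n_trials - 1):
        sides.append(baited_side(sides[-1], rule))
    return sides


print(baiting_sequence("left", "stay", 4))   # ['left', 'left', 'left', 'left']
print(baiting_sequence("left", "shift", 4))  # ['left', 'right', 'left', 'right']
```

A rat that simply repeated its last rewarded turn would do well under the stay rule and poorly under the shift rule, which is the opposite of what the rats actually found easier.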


Response Shaping and Instinctive Drift 

In our discussion of Skinner in Chapter 1, we introduced the idea of response 
shaping. This is a way to train animals to produce specific behaviors they are 
unlikely to emit naturally. The basic idea is that the animal always emits some 
range of behaviors and shaping involves selectively reinforcing ever closer 
approximations to the target response. For instance, Skinner succeeded in train¬ 
ing pigeons to play a simplified Ping-Pong game. Initially, he reinforced them 
whenever they faced the Ping-Pong ball; later he withheld reinforcement until 
they approached the ball; and still later he reinforced them when the beak made 
contact with the ball. Eventually, the pigeon was only being reinforced for actu¬ 
ally hitting the ball. Parents use similar procedures to reinforce children's behav¬ 
ior such as their social skills. Parents begin reinforcing simple "hello's", "please", 
and "thank-you" and eventually (if their training skills are good) they have off¬ 
spring who are graceful members of society. 

As parents will report, however, even the most careful shaping schedules 
sometimes backfire and the learner slips into undesired behavior. At least with 
lower organisms, part of the problem is that the organism's instincts about 
appropriate responses can get in the way of such response shaping. Chapter 1 
mentioned that a pig was trained to go through an elaborate set of procedures 
mimicking the morning routine of a human. Pigs are normally easy to train, but 
the trainers described a problem with what they termed instinctive drift 
(Breland & Breland, 1961). They wanted to train a pig to take a large wooden 
coin and place it in a piggy bank. The pig was able to learn this behavior quite 
well given food reinforcement, but after a few weeks, instead of putting the coin 
in the bank, it would repeatedly drop the coin, root it (dig or turn it up with its 
snout), and toss it up in the air. The pig became useless as a performer, and the 
Brelands had to train another pig, which soon developed the same problem.
This behavior is part of the natural food-gathering behavior of pigs. They had 
come to regard the coins as a food and consequently began to behave toward 
the coins as they did toward food. 




The Brelands reported a variation of this problem when they tried to train 
raccoons to place coins in a container. The raccoons began engaging in behaviors 
that corresponded to washing and cleaning the food. Although the intrusive 
behavior was different from that of the pigs, the behavior was part of the
species-specific food-gathering behavior of raccoons. Thus, organisms' instinctive response
patterns can overwhelm responses carefully shaped by instrumental conditioning. 

Attempts to shape behaviors in organisms may be frustrated 
by species-specific response patterns. 


Autoshaping 

The previous subsection described how conditioning efforts can be frustrated by 
organisms' biological predispositions toward appropriate response patterns. A 
somewhat different result can also happen; experimenters can train a behavior 
without trying. A much-studied example, called autoshaping, was discovered 
by Brown and Jenkins in 1968 in their work with pigeons. At irregular intervals 
they illuminated a response key and then followed the key with food from a
grain dispenser. Although the birds did not have to peck at the key to obtain the 
food, they all started to peck at the key. They wound up behaving as if there had 
been a contingency between pecking and food. 

Considerable effort has been made to understand why the pigeons would 
peck at a key when it was unnecessary. One enlightening experiment was per¬ 
formed by Jenkins and Moore (1973). They deprived the pigeons of either water 
or food, and then they used the autoshaping procedure of the illuminated key 
followed by the appropriate reinforcer. All the pigeons began pecking at the key, 
but the way in which they pecked at it differed depending on the reward. When 
a pigeon had been deprived of food and the reward was food, the pigeon pecked 
with an open beak and made other movements similar to those that pigeons 
make when they are eating. When the reward was water for a pigeon deprived 
of water, the bird pecked at the response key with a closed or nearly closed beak; 
again, this and other features of the pecking movement were like the move¬ 
ments that pigeons make when drinking. 

These results can be interpreted as examples of classical conditioning. That 
is, the lit key is a CS that predicts the US of food or water, and the animal is giv¬ 
ing a conditioned response of pecking to that CS. Although this interpretation 
may be basically correct, it fails to capture the full complexity of autoshaping 
behavior. A good example of this complexity was observed by Timberlake and 
Grant (1975) in a study of autoshaping in rats. Two groups of rats received differ¬ 
ent CSs presented in advance of the delivery of a food pellet. For one group the 
CS was a block of wood; the rats came to gnaw at the wood. For the second group 
the CS was another rat; in this case the rats approached the other rat and engaged 
in various social behaviors, such as sniffing and grooming. Thus, depending on the 
CS, rather different behaviors were autoshaped. The difference makes sense if the
eating behaviors of rats are considered. Rats usually eat in groups and display
social behaviors to other rats while eating; they also gnaw at inanimate objects as
part of their eating behavior. There is a complex species-specific pattern of eating
behavior, and different aspects of it are selected by different stimuli.

Pigeons pecking for water (top row) and for food (bottom row).

Generally, the lesson of the research on autoshaping and instinctive drift 
is that animals come to learning situations with strong patterns of instinctive 
behavior. These patterns may cause the organisms not to learn what the exper¬ 
imenter intended but rather something else. With respect to the issue of what 
the conditioned response is, this research shows that the organism is not just a 
bundle of simple muscle movements waiting to be conditioned to a stimulus. 
Rather, the responses are parts of existing behavioral systems, and their condi¬ 
tioning cannot be understood unless these systems are understood. Timberlake 
(1983, 1984) uses the term behavior systems analysis to refer to this approach 
that emphasizes the natural, unlearned organization of behavior for a species. 

Autoshaping occurs when a stimulus evokes some species-spe¬ 
cific behavior because of its association with a reinforcer. 


Association: Contiguity or Contingency? 

One issue involved in the case of classical conditioning is whether the learning 
is produced because the CS and the US are contiguous or because they are con¬ 
tingent. The corresponding issue in the case of instrumental conditioning is 
whether learning is produced because the response and reinforcer are contiguous
or because they are contingent. Again, contiguity is the requirement that the
two occur in close temporal proximity; contingency is the further constraint of a
predictive relationship between the two. For example, drinking a glass of water 
and feeling healthy may be contiguous, but this does not mean that drinking 
water produces a feeling of being healthy, because the person may usually feel 
healthy. For there to be a contingency, the probability of feeling healthy would 
have to be greater after drinking a glass of water than otherwise. 

Experiments have varied the probability that the reinforcer would be delivered
when the response was made versus when the response was not made, by
analogy with the Rescorla experiment discussed in Chapter 2. For instance,
Hammond (1980) trained rats to press a bar for reinforcement in an experiment
that involved four phases. Figure 3.12 illustrates the results in each phase.

Phase 1. If the rats pressed the bar in any 1-sec interval, they had a
chance of a reinforcer. Hammond shaped the rats to a point where they
were receiving reinforcers after only 5 percent of these response-filled,
1-sec intervals. The rats were making about 3000 bar presses an hour.

Phase 2. Hammond began giving reinforcements 5 percent of the time
when 1 sec passed and no response had been made. He still gave a reward
5 percent of the time when a response was made, but the reward was no
longer contingent on response: it was equally likely whether or not a
response had been made. The rats' rate of responding dropped off rapidly
until they were making virtually no responses. Thus, even though the
same degree of contiguity of response and reward was maintained, rats
stopped responding because there was no longer a contingency.

Phase 3. Hammond stopped giving reinforcers when the rats did not
respond, and the response rate of the rats picked up.

Phase 4. Hammond removed the contingency again, and the response
rate went down again.

FIGURE 3.12 Responses per hour for rats in each of the four phases, when there is
a contingency between pressing and reinforcement and when there is not.
Source: From L. J. Hammond. The effect of contingency upon the appetitive
conditioning of free-operant behavior. Journal of the Experimental Analysis of
Behavior, 34, 297-304. Copyright © 1980 by the Society for the Experimental
Analysis of Behavior, Inc. Reprinted by permission.

These animals were shown to be sensitive to the experimenter's contingencies, 
just as animals were shown to be sensitive to CS-US contingency in classical 
conditioning. 

Organisms display conditioning when there is a contingency 
between response and reinforcement. 
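
One simple way to express the contingency in each phase is the difference between the probability of a reinforcer given a response and the probability given no response. The sketch below uses the 5 percent figures stated for Phases 1 and 2; the values for Phases 3 and 4 are assumptions that simply repeat those conditions, and the function name is hypothetical.

```python
def contingency(p_reward_given_response, p_reward_given_no_response):
    """Difference in reward probability with versus without a response."""
    return p_reward_given_response - p_reward_given_no_response


# (P(reward | response), P(reward | no response)) per 1-sec interval.
phases = {
    "Phase 1": (0.05, 0.00),
    "Phase 2": (0.05, 0.05),
    "Phase 3": (0.05, 0.00),  # assumed: same conditions as Phase 1
    "Phase 4": (0.05, 0.05),  # assumed: same conditions as Phase 2
}

for name, (p_resp, p_none) in phases.items():
    print(name, round(contingency(p_resp, p_none), 2))
```

Contiguity, the probability of reward following a response, is 0.05 in every phase. Only the contingency changes, and responding follows the contingency.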


Superstitious Learning 

Some of Skinner's famous experiments (Skinner, 1948) on what has been called 
superstitious learning were thought to be evidence that contiguity was suffi¬ 
cient for learning and contingency was not necessary. Food was made available 
to pigeons from a feeder at fixed intervals (e.g., 15 sec for some, longer for others)
regardless of what they were doing. Although there was no contingency
between behavior and reinforcement, pigeons in this situation developed high¬ 
ly routinized behaviors. One pigeon turned counterclockwise; another thrust its 
head into the upper corners of its cage. Skinner reasoned that these systematic 
behaviors appeared because of accidental contiguities between what the pigeon 
was doing and the delivery of food. For instance, when the food was delivered, 
the pigeon might be hopping from one foot to the other. The contiguity between 
this response and the food would increase the pigeon's tendency to hop from 
one foot to the other and would thus increase the chance that the pigeon would 
be engaged in this behavior the next time the food was delivered, increasing the 
tendency for the behavior even more, and so on, until the pigeon would always 
be hopping from foot to foot. Thus, even though there was only accidental con¬ 
tiguity between behavior and reinforcement and there was no contingency, con¬ 
ditioning would occur. In effect, the pigeons developed the superstition that 
their behavior was necessary for the reinforcement. Skinner speculated that this 
might be the cause of superstitious behavior in humans, such as rain dances to 
produce rain; sometimes rain dances are indeed followed by rain, but, presum¬ 
ably, they do not produce the rain. 
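
The self-reinforcing loop in Skinner's account can be sketched as a small deterministic model. Everything here is an illustrative assumption: the behavior names, the starting weights (one behavior has already had a single accidental pairing with food), the multiplicative strengthening rule, and the use of expected rather than sampled outcomes.

```python
def superstition_loop(weights, boost=0.5, deliveries=200):
    """Expected growth of behavior tendencies under accidental contiguity.

    On each food delivery, a behavior coincides with food with probability
    equal to its current share, and a coincidence strengthens it by `boost`.
    The update applies the expected strengthening to every behavior.
    """
    w = dict(weights)
    for _ in range(deliveries):
        total = sum(w.values())
        for behavior in w:
            share = w[behavior] / total          # chance of coinciding with food
            w[behavior] *= 1 + boost * share     # expected strengthening
    total = sum(w.values())
    return {behavior: value / total for behavior, value in w.items()}


start = {"hop": 1.5, "turn": 1.0, "preen": 1.0, "head thrust": 1.0, "peck floor": 1.0}
shares = superstition_loop(start)
print({behavior: round(share, 3) for behavior, share in shares.items()})
```

A small initial advantage is amplified on every delivery until one behavior occupies nearly all of the animal's time, which is the outcome this account predicts.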

Subsequent research and analysis have raised doubts about Skinner's 
interpretation of these experiments. Staddon and Simmelhag (1971) repeated 
the superstition experiment and replicated many of Skinner's results. However, 
they demonstrated that the situation was more complicated than Skinner realized.
They noted that the pigeons' behavior could be divided into two categories.
Immediately after receiving a reinforcement, pigeons displayed interim behav¬ 
iors. There was a wide variety of such behaviors, including the sort Skinner 
reported. After a while, pigeons began to engage in terminal behaviors, clearly
in anticipation of the next feeding. This terminal phase always involved some
variety of pecking. 

Staddon and Simmelhag's results present serious difficulties for any 
attempt to explain superstitious behavior as learning by contiguity. First, there is 
no reason for two segments; second, there is no reason for all pigeons to peck 
in the terminal segment, which is contiguous with the reinforcement. Staddon 
and Simmelhag argued that terminal behaviors should be understood as exam¬ 
ples of autoshaping, which, as we have discussed, is perhaps best thought of as 
a classical conditioning phenomenon. 

Although each pigeon evolved systematic interim behaviors, these behaviors
were not contiguous with reinforcement and thus whatever caused them
was not learned by contiguity. Therefore, what was contiguous was not
instrumentally conditioned but was classically conditioned, and what might be
instrumentally learned was not contiguous. Staddon (1983) suggested that these
interim behaviors often served other functions, such as grooming or exercise.

According to this view, human behavior is often analogous to that of pigeons
in these experiments. Many of us eat on rather fixed schedules. When food is not 
likely, we often engage in predictable interim behavior (e.g., studying or watch¬ 
ing television). When food is likely, we engage in predictable terminal behavior 
in anticipation of the food (e.g., going to the kitchen and setting the table). 

Given food at fixed intervals, organisms will first engage in 
interim behaviors when food is not likely and then in terminal 
behaviors when the time for food approaches. 

Partial Reinforcement 

The experiment by Hammond (Figure 3.12) used a partial reinforcement sched¬ 
ule; that is, only some of the responses were rewarded. It is sometimes hard to 
discern that the partial reinforcement rate for a response is greater than the back¬ 
ground rate of reward. Suppose that the probability of getting a reward in 1 sec is 
5 percent if an animal presses a bar, but 4 percent if the animal does not press the 
bar. The animal might fail to detect the contingency and not display conditioning. 
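
A rough calculation shows how weak the 5 percent versus 4 percent signal is. The sketch below treats each 1-sec interval as an independent trial and computes a normal-approximation z score for the difference between the two reward rates. This statistical framing, the sample size of 1000 intervals per condition, and the function name are assumptions made for illustration, not a claim about how animals detect contingencies.

```python
import math


def contingency_z(p_response, p_no_response, n_per_condition):
    """Approximate z score for the difference between two reward rates."""
    standard_error = math.sqrt(
        p_response * (1 - p_response) / n_per_condition
        + p_no_response * (1 - p_no_response) / n_per_condition
    )
    return (p_response - p_no_response) / standard_error


print(round(contingency_z(0.05, 0.04, 1000), 2))  # 1.08
print(round(contingency_z(0.05, 0.00, 1000), 2))  # 7.25
```

Even after 1000 intervals of each kind, the 5 versus 4 percent difference is only about one standard error from zero, whereas the 5 versus 0 percent difference of Hammond's contingent phases stands out clearly.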

When organisms are being maintained on partial reinforcement schedules, 
especially schedules with low rates of reinforcement, they also have a problem 
discriminating when extinction begins. It is easy to discriminate 0 percent rein¬ 
forcement in extinction from 100 percent during conditioning, harder to discrim¬ 
inate 0 percent from 25 percent, and much more difficult to discriminate 0 per¬ 
cent from 1 percent. Organisms are found to take longer to extinguish after train¬ 
ing on a partial reinforcement schedule, and their resistance to extinction 
increases as the reinforcement rate is lowered. This phenomenon is called the 
partial reinforcement extinction effect. It is a bit paradoxical because it implies 
that the less reinforcement received in the past, the slower the organism is to give 
up on an activity. This effect has interesting implications for molding the behavior
of people. For instance, if parents want their children to be persistent in pursuing
a goal in the face of adversity, it suggests that they should only occasionally
reinforce their children's goal-seeking activities. Eisenberger, Heerdt, Hamdi,
Zimet, and Bruckmeir (1979) demonstrated that children completed more work 
in handwriting and mathematics if they had been partially reinforced in the past. 
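
The difficulty of telling extinction apart from a lean schedule can be illustrated with a short calculation. Suppose, as an assumption made only for this sketch, that an organism begins to suspect extinction once a run of nonreinforced responses would have had less than a 5 percent chance of occurring under the training schedule. The function name and the 5 percent criterion are hypothetical.

```python
import math


def nonrewards_to_suspect_extinction(p_reinforce, criterion=0.05):
    """Smallest run of nonreinforced responses rarer than `criterion` in training."""
    if p_reinforce <= 0:
        raise ValueError("training schedule must deliver some reinforcement")
    if p_reinforce >= 1.0:
        return 1  # under 100 percent reinforcement, one omission is already decisive
    # Solve (1 - p)^n < criterion for the smallest whole number n.
    return math.ceil(math.log(criterion) / math.log(1 - p_reinforce))


for p in (1.0, 0.25, 0.01):
    print(p, nonrewards_to_suspect_extinction(p))
```

Under these assumptions the run needed grows from 1 response after 100 percent reinforcement to 11 after 25 percent and 299 after 1 percent, in line with the ordering of difficulty described above.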

Partial reinforcement increases resistance to extinction because the condi¬ 
tions under which the animal learns are similar to the conditions of extinction. 
Basically, the animal learns to respond to the features that occur during extinc¬ 
tion. Several researchers have proposed what these features might be. Capaldi 
(1967) suggested that during learning organisms come to associate sequences of 
nonreinforced responses with eventual reinforcement. Thus, in extinction, when 
the organism encounters a sequence of nonreinforced trials, it expects rein¬ 
forcement. Amsel (1967) proposed that during initial training, the organism 
becomes frustrated when it does not receive reinforcement and has associated 
its frustration with reinforcement. Thus, when frustrated in extinction, it also 
expects reinforcement. Both theories have in common the idea that the partial¬ 
ly reinforced organism learns to associate reinforcement to the kinds of features 
encountered in extinction. 

Conditioning is more difficult in partial reinforcement sched¬ 
ules, but such schedules result in greater resistance to extinction. 


Learned Helplessness 

Perhaps the most dramatic evidence that organisms can be aware of the contin¬ 
gency (or lack thereof) between their behavior and reinforcement is found in the 
experiments on learned helplessness. In a prototypical experiment by 
Seligman and Maier (1967), dogs were given painful shocks at unpredictable 
intervals. A control group of dogs could avoid the shocks by pushing a panel, 
whereas the experimental group could do nothing to escape the shock. Thus, 
one group of dogs learned a behavior that would eliminate shock, whereas the 
other did not.

Both groups were then placed in the same escape-avoidance condition:
they could avoid the shock if they jumped over a barrier after hearing a tone. 
Dogs in the control group, which could control their shock in the first phase, 
readily learned to jump over the barrier. In contrast, the experimental dogs 
whined and yelped but made no attempt to escape the shock. After many trials, 
the animals simply lay down and hardly moved at all. They had learned that 
nothing they could do would prevent shock—that there was no contingency 
between their behavior and receiving shock. 

Maier, Jackson, and Tomie (1987) argued that learned helplessness is produced
because the organism pays less attention to its own behavior. Past behavior
has been a poor predictor of whether it will receive shock, and so the organism
continues to assume its behavior will have no effect in a situation where it
could learn to escape shock. This situation is like latent inhibition in classical con¬ 
ditioning, where an organism comes to ignore a certain CS (see Chapter 2), or 
like dimensional learning in instrumental conditioning, where the organism 
comes to ignore a dimension (see earlier in this chapter). Consistent with this 
interpretation, animals will show similar failure of learning in the case of positive 
reinforcement (e.g., Job, 1989). If a response has previously not been associated
with food, they will continue to ignore it when it acquires such a contingency. 

Similar effects occur in many situations with many species, including 
humans. Some argue that this may be what is behind such phenomena as math 
phobia. After a long series of failures, people come to believe that nothing they 
can do will help them learn math and so they stop trying. In one experiment, 
Hiroto and Seligman (1975) showed that humans subjected to a long series of 
unsolvable anagram problems failed to learn other easy-to-learn experimental 
tasks. Seligman (1975) also suggested that clinical depression may be a variety 
of learned helplessness. When people suffer a number of uncontrollable negative
life events, they may withdraw, thinking that they have no control over any
aspect of their lives.

To deal with these clinical problems, Seligman has suggested a number of 
measures, based on analogy to research with dogs. If a helpless dog is forced to 
cross the barrier enough times with success, it will eventually cross on its own. 
By analogy, depressed patients might be helped by exposure to success experi¬ 
ences. Dogs can also be immunized by initial exposure to situations where they 
can escape from shock; they are then less likely to learn helplessness when later 
exposed to inescapable shock. By analogy, early successes in mathematics for 
children, earned by their hard work and efforts, may inoculate them against later 
math difficulties, developing in them the tendency to persist in the face of difficulties
or failures. However, given what we know about partial reinforcement
(previous subsection), a schedule of "partial success" would probably be more
effective than a schedule of "success only" in promoting persistence in the presence
of temporary failure (Dweck, 1975; Kennelly, Dietz, & Benson, 1985).

Organisms that have repeatedly received unavoidable aversive
stimuli come to ignore the relationship between their behavior
and environmental outcomes.

Associative Bias 

Although organisms may be capable of learning many response-reinforcer 
associations, they are biologically predisposed to learn certain associations, just 
as they are predisposed to learn certain stimulus-stimulus associations in clas¬ 
sical conditioning (e.g., taste-poisoning discussed in Chapter 2). A pigeon can 
more readily learn to peck to receive food than to avoid shock (Hineline & 
Rachlin, 1969; MacPhail, 1968; Schwartz, 1973), but it can quite readily learn to 
flap its wings to escape shock (Bedford & Anger, 1968). These outcomes make
sense because pecking is part of the pigeon's eating repertoire and wing flap¬ 
ping is part of its repertoire of escape behaviors. 

Shettleworth (1975) did an interesting analysis of the effects of reinforce¬ 
ment on various behaviors of hamsters. She noted that they tended to engage 
in certain behaviors when hungry such as standing on their hind legs (which 
she called open rear), scraping at walls (scrabbling), and digging in the ground. 
Other activities, such as washing their faces, scratching, and marking (pressing 
a scent gland), did not increase when they were hungry. Different hamsters 
were reinforced by food for each of these six behaviors. Figure 3.13 shows the 
results. Subjects learned to increase the eating behaviors but not the noneating 
behaviors in response to food reinforcement. Thus, organisms show associative 
biases in instrumental conditioning as in classical conditioning; in instrumental 
conditioning, they are biased to certain response-reinforcer pairings. 

Bolles (1970) argued that associative bias was a particularly important factor
in the case of escape behavior. He argued that each species has species-specific
defense reactions, which determine the difficulty of learning an escape
behavior. For instance, rats find it easy to learn to flee to avoid a shock but hard
to learn to press a bar to escape shock. The relative ease of these two responses
is reversed if the reinforcer is food.

FIGURE 3.13 Mean time spent performing the reinforced response per 1200-sec
session. Source: From S. J. Shettleworth. Reinforcement and the organization of
behavior in golden hamsters: Hunger, environment, and food reinforcement. Journal of
Experimental Psychology: Animal Behavior Processes, Volume 104. Copyright © 1975 by the
American Psychological Association. Reprinted by permission.

Humans face difficulties in skill learning when the skills entail learning 
responses that are antagonistic to human predispositions. For example, in 
downhill skiing the skier leans forward to control speed and should lean for¬ 
ward more the steeper the hill. Most beginners have difficulty because of their 
natural tendency to lean backwards. As another example, when a car is skidding 
on an icy road the driver needs to turn into the skid and not slam on the 
brakes—drivers have great difficulty learning the appropriate response and 
inhibiting the incorrect response. 

Organisms are biologically prepared to learn certain
response-outcome combinations. 


Instrumental Conditioning 
and Causal Inference 

We have focused on instrumental conditioning experiments from the animal's 
perspective. But human subjects can be placed in similar situations. Imagine what 
it would be like if you were put in a room to explore and discovered that some¬ 
times when you flipped a switch on the wall, money came forth. If you thought 
you would be able to keep any money you found, you might find yourself flipping 
that switch as fast as a rat pushes a lever or a pigeon pecks a key. Your perfor¬ 
mance could be plotted in cumulative response records, and we could speak of 
you as learning an association between the switch and money. To speak of it as an 
association, though accurate, would probably not fully express your mental state. 
You probably also would have formed the belief that flipping the switch caused 
the money to come forth. It is unclear to which other organisms such causal 
beliefs may be ascribed, but it is appropriate to ascribe them to humans. 

Wasserman (1990) studied the development of humans' causal beliefs in 
instrumental conditioning paradigms and found that these causal beliefs devel¬ 
op much as associations do in lower organisms. Subjects were given a key, 
which they were encouraged to press. Sometimes when the subject pressed the 
key a light went on, and sometimes when the subject did not press the key the 
light went on. The light was like a reinforcer (or in this case more like a neutral 
stimulus) that followed the response. Wasserman varied the probability that the 
key press would be followed by the light. He broke the experiment into 1-sec 
intervals. If a subject pressed the key in the interval, the interval would end with 
a light flash with different probabilities in different experimental conditions. He 
used probabilities of 0.00, 0.25, 0.50, 0.75, and 1.00. These probabilities were 
referred to as $P(O \mid R)$, for probability of outcome given response. Wasserman also
manipulated the probabilities that a 1-sec interval without a key press would
result in a light. These probabilities were referred to as $P(O \mid -R)$, for probability
of outcome given no response, and they took on the same values of
0.00, 0.25, 0.50, 0.75, and 1.00. Wasserman looked at all combinations of $P(O \mid R)$
and $P(O \mid -R)$, for 5 x 5 = 25 conditions.

FIGURE 3.14 Causal inference as a function of the probability of a light given a
key press and the probability given no key press. (Data from Wasserman et al., 1993.)

Wasserman asked subjects to rate the causal relationship between the 
press and light on a scale that varied from -100 (prevents light) to +100 (causes 
light). Figure 3.14 illustrates the results. As in the animal conditioning experiments,
subjects' ratings of causal strength were a function of the difference
between $P(O \mid R)$ and $P(O \mid -R)$. The particular level of causal strength for a value
of $P(O \mid R)$ depended on the value of $P(O \mid -R)$. This is the same sort of relationship
Rescorla illustrated in his experiment on classical conditioning (see Figure 2.9).
Chapter 10 examines human causal inference further, but the research described 
here indicates that causal inference may be closely related to conditioning. 
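
The design and the quantity that the ratings tracked can be tabulated directly. The following is only an illustrative sketch: the function names `delta_p` and `contingency_table` are ours, not Wasserman's, and the code computes nothing more than the difference $P(O \mid R) - P(O \mid -R)$ for each of the 25 conditions. It makes no claim about the actual ratings subjects gave.

```python
from fractions import Fraction

# The five probability levels used for both P(O|R) and P(O|-R).
LEVELS = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]


def delta_p(p_o_given_r, p_o_given_not_r):
    """Return the response-outcome contingency P(O|R) - P(O|-R)."""
    return p_o_given_r - p_o_given_not_r


def contingency_table(levels=LEVELS):
    """Map every (P(O|R), P(O|-R)) pair in the design to its contingency."""
    return {(pr, pn): delta_p(pr, pn) for pr in levels for pn in levels}


if __name__ == "__main__":
    table = contingency_table()
    print("rows: P(O|R); columns: P(O|-R)")
    for pr in LEVELS:
        row = [f"{float(table[(pr, pn)]):+.2f}" for pn in LEVELS]
        print(f"{float(pr):.2f}  " + "  ".join(row))
```

Cells on the diagonal, where the light is equally likely with or without a key press, have a contingency of zero whatever the overall rate of the light.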

Human judgments of causality are affected by the same con¬ 
tingency variables that influence animal conditioning. 


Application of the Rescorla-Wagner Theory 

Wasserman, Elek, Chatlosh, and Baker (1993) showed that the behavior of their 
human subjects could be predicted by the Rescorla-Wagner theory. First, let's 
consider how the theory would apply to instrumental conditioning in general. 
Recall that in classical conditioning this theory assumes that the strength of asso¬ 
ciation between the CS and the US changes according to the following equation: 

$$\Delta V = \alpha(\lambda - V)$$

where $\alpha$ is the learning rate; $\lambda$ is the maximum strength of association; and $V$ is
the sum of the existing associative strengths from the CSs presented on that
trial. This theory can be mapped onto instrumental conditioning by letting the
experimental context and the response be two cues (i.e., the CSs) that are associated
to the reinforcement (i.e., the US). Then $\lambda$ represents the strength of association
that can be conditioned to the outcome or reinforcement. When the outcome
occurs after a response, there are two cues for conditioning: the response
and the stimuli of the experimental context. If the outcome occurs without the 
response, then only the contextual stimuli are present. This is a competitive 
learning situation in which the response and the context are competing for 
association to the reinforcement. This way of applying the Rescorla-Wagner 
theory to instrumental conditioning predicts many features of instrumental con¬ 
ditioning, just as it predicts the features of classical conditioning. 
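
The mapping just described can be sketched as a small simulation. This is a minimal illustration under assumptions that are ours rather than the source's: the simulated subject responds in each interval with a fixed probability (`p_response`), the same learning rate is used whether or not the outcome occurs, the target strength is `lam` when the outcome occurs and 0 otherwise, and the specific parameter values are arbitrary. The context cue is present in every interval; the response cue is present only when a response is made.

```python
import random


def simulate_rescorla_wagner(p_o_given_r, p_o_given_not_r, p_response=0.5,
                             alpha=0.01, lam=1.0, n_intervals=100_000, seed=0):
    """Return (V_context, V_response) averaged over the second half of the run."""
    rng = random.Random(seed)
    v_context, v_response = 0.0, 0.0
    sum_context, sum_response, n_averaged = 0.0, 0.0, 0
    for i in range(n_intervals):
        responded = rng.random() < p_response
        p_outcome = p_o_given_r if responded else p_o_given_not_r
        # Target is lam if the outcome occurs in this interval, else 0.
        target = lam if rng.random() < p_outcome else 0.0
        # V is the summed strength of the cues present in this interval.
        v_total = v_context + (v_response if responded else 0.0)
        delta = alpha * (target - v_total)
        v_context += delta          # the context is always present
        if responded:
            v_response += delta     # the response cue competes with the context
        if i >= n_intervals // 2:   # average only after initial learning
            sum_context += v_context
            sum_response += v_response
            n_averaged += 1
    return sum_context / n_averaged, sum_response / n_averaged


if __name__ == "__main__":
    for p_r, p_not_r in [(0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]:
        v_c, v_r = simulate_rescorla_wagner(p_r, p_not_r)
        print(f"P(O|R)={p_r:.2f} P(O|-R)={p_not_r:.2f} "
              f"V_context={v_c:.2f} V_response={v_r:.2f}")
```

Under these assumptions the strength accruing to the response should settle near the difference between the two outcome probabilities, with the context absorbing the outcome rate that occurs without a response.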

One outcome that the Rescorla-Wagner theory predicts is the subject's 
sensitivity to the difference in reinforcer rates in the presence versus the absence 
of the response. This sensitivity is seen in Hammond's experiment on bar press¬ 
ing with rats (Figure 3.12) and in Wasserman's human analogue (Figure 3.14). 
Chapman and Robbins (1990) showed mathematically that, according to the 
Rescorla-Wagner theory, the competition between context and response results 
in a strength of association to the response that is proportional to the difference 
in reinforcement rates. Thus, the process by which people form causal inferences 
corresponds closely to the predictions of an associative learning theory. 
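
One way to see where the difference in reinforcement rates comes from is to ask when the expected changes in strength are zero. The following is a sketch under simplifying assumptions and is not claimed to be Chapman and Robbins's own derivation: $V_C$ and $V_R$ (the strengths of the context and of the response) and $q$ (the probability of a response in an interval, with $0 < q < 1$) are notation introduced here; the learning rate $\alpha$ is taken to be the same on all trials; and the target strength is 1 when the outcome occurs and 0 otherwise.

```latex
\begin{align*}
E[\Delta V_R] &= q\,\alpha\,\bigl(P(O \mid R) - V_C - V_R\bigr) = 0
  &&\Rightarrow\quad V_C + V_R = P(O \mid R),\\
E[\Delta V_C] &= q\,\alpha\,\bigl(P(O \mid R) - V_C - V_R\bigr)
  + (1 - q)\,\alpha\,\bigl(P(O \mid -R) - V_C\bigr) = 0
  &&\Rightarrow\quad V_C = P(O \mid -R),\\
V_R &= P(O \mid R) - P(O \mid -R).
\end{align*}
```

The response is updated only in intervals that contain a response, whereas the context is updated in every interval, so the context ends up carrying the outcome rate that holds without the response and the response carries only what is left over.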

The Rescorla-Wagner theory can predict behavior in an
instrumental conditioning paradigm by assuming competitive
learning between context and response.

Interpretations 

Two rather different conclusions are possible from this research on causal inference
and the Rescorla-Wagner theory. One conclusion is that the simple associative
learning processes of the Rescorla-Wagner theory are responsible for human
causal inference. As noted, Chapman and Robbins showed that the theory results
in strengths of association between response and outcome that are exactly equal
to the difference $P(O \mid R) - P(O \mid -R)$. The theory in no way explicitly estimates
probabilities $P(O \mid R)$ and $P(O \mid -R)$, let alone takes their differences. Nonetheless, it
estimates this quantity, supporting the point made in the previous chapter that simple
associative learning judgments can mimic sophisticated statistical inference.

A dramatically opposite conclusion can also be drawn. Subjects in these 
experiments were not in conditioning experiments; that is, they were not in sit¬ 
uations in which experimental contingencies reinforced their responses. Rather, 
they were asked to make judgments of causal relatedness between response and 
outcome. The fact that their causal inferences were like conditioning suggests 
that causal inference, and not simple associative learning processes, underlies
conditioning. That is, what organisms are learning in an instrumental conditioning
experiment might be a causal model of the environment, and they act consciously
according to it. As discussed in Chapter 10, this view is the appropriate
interpretation of the human situation, and it may be the appropriate interpreta¬ 
tion of the conditioning behavior of higher nonhuman organisms as well. 
Holyoak, Koh, and Nisbett (1989) showed that many conditioning phenomena 
in classical and instrumental conditioning can be explained by assuming that 
organisms learn causal rules to predict the structure of their environment. 

A wide range of conditioning phenomena can be explained by assuming 
either simple associative learning or conscious cognitive judgment. To reiterate 
a theme of this book, this is not an either-or situation. Some instances of con¬ 
ditioning in some organisms may be due to unconscious, associative processes, 
and other instances of conditioning in other organisms may be due to develop¬ 
ment of causal models. There may be subtle differences between conditioning 
behavior produced by simple associative learning versus conscious inference, 
but by and large they look similar behaviorally because both reflect learning 
adaptations of the organism to the structure of its environment. We will return 
to this issue in greater detail when we discuss human causal inference in 
Chapter 10. 

Conditioning phenomena can be explained by assuming either
acquisition of simple associations or development of causal 
models. 


The Hippocampus and Conditioning 

The hippocampal formation is a relatively small structure surrounded by the 
temporal cortex. The hippocampus has been strongly implicated in learning and 
memory in many organisms. In Chapters 7 and 8 we will discuss its important 
role in human memory and how damage to it can result in profound memory 
deficits. Here we will simply discuss the research that has been done involving 
rats. Figure 3.15 compares the hippocampal formation of a rat, a monkey, and a 
human. Note their differences in size and anatomy. It is not obvious that it 
serves the same function in all species, but it is the hope of the field that it serves 
similar functions. If this is true, then research done on the rat will shed light on 
the nature of human memory and its deficits. 

Figure 3.16 presents an outline of the relevant components of the rat hip¬ 
pocampus and a schematic of their relevant connections. There are connections 
from the cortex itself to what is called the parahippocampal region (which is 
adjacent to the hippocampus) and connections from this to the hippocampus 
itself. Many experiments have been performed studying the impact of lesions 
(removal) of the hippocampus on learning in rats. These lesion studies initially 
involved removal of both the parahippocampal areas and the hippocampus.

FIGURE 3.15 Comparative hippocampal gross anatomy: rat (a), monkey (b), and
human (c). (From Rosene & Van Hoesen, 1987.)

More recent research, using more refined techniques, has tried to separate out 
the contribution of the parahippocampal areas to learning from the contribution 
of the hippocampus (and we will describe these studies momentarily). Rats with 
lesions to the parahippocampal area and the hippocampus perform poorly in a 
wide range of instrumental and classical conditioning paradigms. They show a 
particular deficit in tasks involving a substantial spatial component, such as 
maze learning. An example that illustrates the deficit involves the Morris water- 
escape task (Morris, 1981). Rats are placed in a circular pool of water and must 
swim to an escape platform. If they climb onto the escape platform, the experi¬ 
menter removes them from the pool; otherwise they are left to swim around. 
The water is murky, and so the rats are unable to see anything below the sur¬ 
face. In some conditions, the escape platform is above the water's surface and 
the rats can see it; in other conditions, it is just below the surface and they cannot
see it. In the original experiment Morris contrasted these four conditions:

1. Cue + place. The escape platform is always visible and always in the same
location.

2. Place. The platform is submerged but always in the same location.

3. Cue only. The escape platform is always visible but in different locations
on different trials.

4. Place random. The platform is submerged and in different locations.

FIGURE 3.16 (A) Simple schematic diagram of cortical-hippocampal connection.
(B) Outline of a horizontal rat brain section illustrating the locations and flow of
information between components of the hippocampus, parahippocampal region, and
adjacent cortical areas. DG, dentate gyrus; EC, entorhinal cortex; FF, fimbria-fornix;
Hipp, hippocampus proper; OF, orbitofrontal cortex; Pir, piriform cortex; PR, perirhinal
cortex; Sub, subiculum. Source: From H. Eichenbaum, Declarative Memory: Insights
from Cognitive Neurobiology, Vol. 48, p. 559. Reprinted with permission.

Rats learned to swim quickly to the escape platform in all conditions but the 
last. Figure 3.17 shows the tracks taken by a rat in each group on the last four 
trials. Only the rats in the last group wandered much in the pool. This task is sig¬ 
nificant because it shows that rats are excellent in using a spatial representation 
to navigate through their environment. As Figure 3.17 illustrates, although rats 
in the place condition started from a different part of the pool on each trial, they 
knew where the submerged platform was and swam to it. 

This experimental paradigm has become important for understanding the 
role of the hippocampus. Rats with hippocampal lesions perform poorly in the 
place condition—no better than normal rats perform in the place-random con¬ 
dition. In contrast, normal and lesioned rats behave similarly when the platform 
is visible (Morris, Garrud, Rawlins, & O'Keefe, 1982). Results such as this have 
been used to argue that the hippocampus is significant in spatial learning. Some 
other kinds of learning are not impaired by hippocampal lesions. For instance, 
lesioned rats can still learn taste aversions and how to make simple visual discriminations.

FIGURE 3.17 A vertical view of the tracks taken by rats in each group. Source: From
R. G. M. Morris. Learning and Motivation, Volume 12. Copyright © 1981 by Academic
Press. Reprinted by permission.

Rats with hippocampal lesions perform poorly in many tasks 
that require spatial learning. 




The Nature of Hippocampal Learning 

The field has been struggling to characterize the kinds of learning impaired by 
hippocampal damage. O'Keefe and Nadel (1978) proposed that the hippocam¬ 
pus, at least in the rat, is especially designed for learning spatial information. In 
effect, it encodes Tolman's spatial map (see Chapter 1). They reported that many
neurons in the hippocampus fire only when the animal is in a certain location 
in space. 

Olton, Becker, and Handelmann (1979) argued for a different interpreta¬ 
tion of hippocampal deficits. Noting that many deficits occur in nonspatial tasks 
and that some spatial tasks fail to show a deficit, they argued that the deficit is 
a more general inability to hold information in working memory (a concept discussed
at length in Chapter 5) over short periods of time. An example of the distinction
to which they refer can be illustrated with respect to the radial maze
(Figure 3.10). Olton et al. reported a study that used a 17-arm version of this 
maze in which 8 of the arms were baited with food and the other 9 were not. 
With experience with this maze, normal rats learned two things: 

1. Never to enter the 9 arms that were never baited with food. 

2. To efficiently explore the baited arms to avoid repeat visits, as discussed 
with respect to Figure 3.10.

Rats with hippocampal lesions learned 1 but not 2. Both sorts of information are
spatial, but lesioned rats can learn one and not the other. In Olton's terms what
they cannot do is rapidly update their working memory to avoid repeated visits
(2). Given enough experience, however, they can learn permanent properties of
their spatial environment (1).

Sutherland and Rudy (1991), taking a more traditional conditioning per¬ 
spective, argued that the deficit is in the ability to form configural associations. 
(See the discussion in Chapter 2 of the distinction between associations to stim¬ 
ulus configurations versus stimulus elements.) They argued that to solve the 
Morris water-escape task when the platform was submerged, the animal had to 
respond to a configuration of spatial cues, whereas when the platform was visible
the animal could simply respond to the visible platform. Eichenbaum,
Stewart, and Morris (1990) ran a variation of the submerged condition in which 
rats always started from the same location. In this case, hippocampal lesioned 
rats learned the task. In this condition, they did not have to respond to the con¬ 
figural cues but could just swim in the same direction. 

Sutherland and Rudy (1991) performed the following experiment, which 
showed that rats with hippocampal lesions had difficulty learning a nonspatial 
task that involved forming configural associations. Animals were rewarded with 
food for pressing a bar when a light alone or a tone alone appeared. However, 
they were not reinforced for responding when the light and tone were present¬ 
ed simultaneously. As discussed in Chapter 2, normal animals can perform this 
task, which requires learning associations to stimulus configurations of light +
no tone and tone + no light. Rats with hippocampal lesions are unable to learn
these associations, although they can learn to respond to the simple single stim¬ 
uli. Thus, this is a nonspatial task in which lesioned rats show a deficit. 
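
Why this task calls for configural associations can be illustrated with the Rescorla-Wagner learning rule. The sketch below is our own illustration, not Sutherland and Rudy's model or procedure: the cue names, the learning rate, the number of training cycles, and the coding of "light + no tone" and "tone + no light" as extra cues (`light_no_tone`, `tone_no_light`) are all assumptions made for the example. It compares a learner that has only the single stimuli as cues with one that also has the two configural cues.

```python
def train(trials, cue_names, alpha=0.1, epochs=2000):
    """Apply the Rescorla-Wagner update over repeated cycles of the trials."""
    v = {name: 0.0 for name in cue_names}
    for _ in range(epochs):
        for present, lam in trials:
            # Error is the target minus the summed strength of the cues present.
            error = lam - sum(v[c] for c in present)
            for c in present:
                v[c] += alpha * error
    return v


def predict(v, present):
    """Summed associative strength of the cues present on a trial."""
    return sum(v[c] for c in present)


# Light alone and tone alone are reinforced; light and tone together are not.
ELEMENTAL_TRIALS = [(("light",), 1.0), (("tone",), 1.0), (("light", "tone"), 0.0)]
CONFIGURAL_TRIALS = [(("light", "light_no_tone"), 1.0),
                     (("tone", "tone_no_light"), 1.0),
                     (("light", "tone"), 0.0)]

if __name__ == "__main__":
    elemental = train(ELEMENTAL_TRIALS, ["light", "tone"])
    configural = train(CONFIGURAL_TRIALS,
                       ["light", "tone", "light_no_tone", "tone_no_light"])
    print("elemental, light alone:", round(predict(elemental, ("light",)), 2))
    print("elemental, light and tone:", round(predict(elemental, ("light", "tone")), 2))
    print("configural, light alone:",
          round(predict(configural, ("light", "light_no_tone")), 2))
    print("configural, light and tone:",
          round(predict(configural, ("light", "tone")), 2))
```

With only the single stimuli as cues, the summed strength for the compound cannot fall below that for either stimulus alone while both remain positive, so the learner cannot withhold responding to the compound. With the configural cues available, the same rule can learn the discrimination.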

In recent years, the field has been developing elaborations of this config¬ 
ural cue proposal. The basic idea is that an organism without a hippocampus can 
only respond to single stimulus dimensions, but with a hippocampus, the
organism can respond to stimulus combinations. It has been shown that hip¬ 
pocampal cells will fire selectively to various combinations of cues (for instance, 
odors in rats—Otto & Eichenbaum, 1992) just as O'Keefe and Nadel found cells 
that responded to combinations of various spatial cues. 

Both Eichenbaum and Bunsey (1995) and Gluck and Myers (1995) have 
made a similar distinction between two ways in which the hippocampus can 
join different elements into a whole. In one case, the elements are fused into a 
single whole in which the identity of the elements is lost. Eichenbaum and
Bunsey (1995) suggest the analogy of combining two words like hell and o into
the word "hello." The other way is to join the elements into an association in 
which the individual element identity is preserved. The analogy suggested by 
Eichenbaum and Bunsey is joining the same words in English, army and table, 
into a paired-associate army-table. There is evidence that the parahippocampal 
region performs stimulus fusion, while the hippocampal region performs the 
element combination (see Figure 3.16). 

Gluck and Myers use this distinction to explain latent inhibition and the 
effect of hippocampal lesions on latent inhibition. Recall from Chapter 2 that 
latent inhibition refers to the phenomenon that if a CS is presented alone a number
of times before being paired with a US, it becomes harder to condition to that US. According to Gluck
and Myers, this is because the CS becomes fused with the context and is hard 
to separate from the context. Lesions that include both the parahippocampal 
region and the hippocampus abolish latent inhibition and make it easier for the 
lesioned animals to condition in a latent inhibition paradigm. However, latent 
inhibition is maintained if just the hippocampus is affected. According to Gluck 
and Myers, this is because the parahippocampal region performs the fusion that 
is responsible for latent inhibition. Recall that latent inhibition was one of the 
problems with the Rescorla-Wagner theory. Gluck (1997) has argued that the 
Rescorla-Wagner learning rule describes cortical learning but that different 
learning rules are required to characterize hippocampal learning. 

The apparent role of the hippocampus is to bind stimulus ele¬ 
ments into combinations. 

Long-Term Potentiation (LTP) 

Another reason for the interest in the hippocampus is that it is one region of the 
brain where a particular type of neural learning has been displayed. When brief, 
high-frequency electrical stimulation is administered to some neural areas of 
the hippocampus, there is a long-term increase in the magnitude of the 
response of the cells to further stimulation (e.g., Bliss & Lømo, 1973). This
change, called long-term potentiation (LTP), occurs immediately and lasts for 
weeks. LTP involves increasing the synaptic connections among neurons. For 
LTP to take place, the presynaptic and postsynaptic neurons must be simultaneously
active. Because it is a long-lasting change and depends on joint activation
of two neurons, it is thought to be involved in at least some kinds of associative
learning. Although LTP in the hippocampus has been studied most, it
occurs in many other regions of the brain as well. 

A great deal of research has been done on the physical basis for LTP in the 
hippocampus (for a review, see Bliss & Lynch, 1988, or Swanson, Teyler, & 
Thompson, 1982). The LTP procedure results in structural changes in the den¬ 
drites onto which axons synapse. The dendrites grow new spines at points 
where the axons synapse, and existing receptors on the dendrites become 
rounder. The change in the shape of the receptors appears temporary, but the
increase in the number of spines is more long lasting. In addition to these postsynaptic
changes there are presynaptic changes involving an increase in the
release of neurotransmitters. Recall that the neural basis of learning in Aplysia
also involved an increase in the presynaptic release of neurotransmitters.

Considerable work has been done on the biochemistry of these changes
in the spines. Certain receptors in the postsynaptic membrane on the dendrite
(NMDA receptors) are normally blocked and become unblocked only if the
postsynaptic cell has fired. If the presynaptic cell fires and sends a neurotransmitter
to the postsynaptic membrane at the same time that the postsynaptic cell
is firing, then these unblocked receptors can receive the neurotransmitter. It is
thought that the unblocking of these NMDA receptors is the critical step in the
production of LTP. This unblocking in turn enables calcium to enter into the
postsynaptic neuron, resulting in an increase in both the receptors in the postsynaptic
neuron and the presynaptic release of the neurotransmitter. Kandel
and Hawkins (1992) speculated that the postsynaptic influx of calcium may
cause chemical messengers to be transmitted to the presynaptic axon, resulting
in the increased release of neurotransmitter.

Simultaneous activity of presynaptic and postsynaptic cells in 
the hippocampus can produce long-term facilitation of the 
synaptic connection. 

Long-Term Potentiation and Hippocampal Learning 

Much of the interest in LTP has arisen because LTP has been well documented 
in the hippocampus and hippocampal damage is known to produce learning 
deficits in a wide range of tasks. Thus, it has been conjectured that, when intact 
animals learn these tasks, LTP is the neural process that underlies their learning.
An effort has been made to bolster this connection by showing that pharmacological
interventions that interfere with LTP produce learning deficits similar
to those produced by hippocampal lesions.

Morris, Anderson, Lynch, and Baudry (1986) examined the effects of 
blocking LTP by injecting a drug that prevents activation of NMDA receptors 
involved in LTP. They looked at the performance of injected rats in the Morris 
water tasks and found significant impairment, similar to that of lesioned rats.
The same injected rats were not impaired in tasks such as visual discrimination,
which are also not impaired by hippocampal lesions. Similar drug-induced
learning deficits that mimic those of rats with hippocampal lesions have been
reported by Staubli, Thibault, DiLorenzo, and Lynch (1989) and Robinson,
Crooks, Shinkman, and Gallagher (1989).

Some doubt has been expressed about whether LTP is really involved in 
the kind of learning observed in tasks such as the Morris water-escape task.
Keith and Rudy (1990) noted that drug-injected rats, though impaired, showed
more learning than hippocampal-lesioned rats. Thus, while LTP may play some
role in learning, it does not seem to be all that there is to hippocampal involve¬ 
ment. More recent research (Bannerman, Good, Butcher, Ramsay, & Morris, 
1995; Saucier & Cain, 1995) has found totally normal learning of the water maze 
in drug-injected rats that have had some general training in this kind of task, 
although not the specific water maze. Thus, it seems that the LTP component 
may only block general learning of how to do the task and not the actual spa¬ 
tial properties of the water maze. The spatial structure of the maze appears to be 
learned by some other hippocampal process not involving LTE 

LTP is only part of the neural changes that underlie learning
in hippocampal-dependent tasks.


Final Reflections on Conditioning 

In both instrumental and classical conditioning experiments, animals and
humans are capable of learning about their environments and responding adaptively.
In classical conditioning they learn that one stimulus predicts another,
and they respond in anticipation of that fact. In instrumental conditioning they
learn that a stimulus signals that a certain class of responses will lead to some
outcome, and they respond according to whether or not that outcome is reinforcing.
This research fits the adaptive function of learning identified in the first
chapter.

In keeping with the language of the field, this chapter has referred to
organisms forming associations among stimuli and responses. However, the
meaning of the term association does not capture all that is going on. The organisms
are not just connecting these stimuli and responses; rather, they are learning
that certain elements predict other elements. In the case of instrumental
conditioning, they are learning about the causal structure of their environment,
for instance, that a bar press causes food to be delivered. This learning
need not involve an explicit causal model. This chapter showed how the
Rescorla-Wagner theory is capable of implicitly encoding this causal structure in
simple associations. We will return to the issue of causal inference in Chapter 10,
where we will learn about other mechanisms for inferring causal structure in
humans.
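
As a rough illustration of how simple associations can implicitly encode
causal structure, the following sketch simulates the Rescorla-Wagner update
for a bar-press situation. It is a minimal sketch under stated assumptions,
not the chapter's own formulation: the function name, the cue names, the
single learning rate alpha, the asymptote lam, and the trial schedule are all
hypothetical choices made for illustration. Food is delivered when the bar is
pressed in the experimental context and is withheld when the context is
present without a bar press.

```python
def rescorla_wagner(trials, alpha=0.2, lam=1.0):
    """Apply the Rescorla-Wagner update over a sequence of trials.

    trials: list of (cues, reinforced) pairs, where cues is a set of cue
            names present on that trial and reinforced is True or False.
    alpha:  learning rate (assumed here to be the same for every cue).
    lam:    maximum associative strength the reinforcer can support.
    Returns a dict mapping each cue to its final associative strength.
    """
    strengths = {}
    for cues, reinforced in trials:
        # The prediction is the summed strength of all cues present.
        predicted = sum(strengths.get(cue, 0.0) for cue in cues)
        target = lam if reinforced else 0.0
        error = target - predicted
        # Every cue present on the trial shares the same error signal.
        for cue in cues:
            strengths[cue] = strengths.get(cue, 0.0) + alpha * error
    return strengths


# Hypothetical schedule: bar press in context is reinforced,
# context alone is not.
trials = []
for _ in range(200):
    trials.append(({"context", "bar_press"}, True))
    trials.append(({"context"}, False))

strengths = rescorla_wagner(trials)
print(strengths)
```

Under these assumptions the strength of the bar press approaches the
asymptote while the strength of the context falls toward zero. The
associations thus come to single out the bar press, and not the background
context, as the predictor of food, even though no explicit causal model is
represented anywhere in the simulation.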

In other paradigms, such as maze learning, organisms are learning something
more specific than just what predicts what. They are learning about the
spatial layout of their environment and what objects are located where. This
cognitive map can be used flexibly to achieve goals. The nature of spatial memory
in the human case is investigated further in Chapter 6.

A general characterization of conditioning is that it involves learning useful
information that allows the organism to respond adaptively to the reinforcement
contingencies of the experiment. The next chapter focuses on the role of
reinforcement in conditioning.


In a conditioning experiment, organisms are learning things 
about their environment and using this information to achieve 
their needs. 


Further Readings 

The textbooks and journals cited at the end of Chapter 2 are also excellent 
sources for research on instrumental conditioning. In addition, many research 
articles on instrumental conditioning are found in Learning and Motivation and 
Journal of the Experimental Analysis of Behavior. Balsam (1988) reviews the 
research relevant to stimulus generalizations and discriminations. Staddon and 
Ettinger's (1989) text on learning emphasizes its adaptive function. Gluck and 
Myers (1997) and Eichenbaum (1997) are two reviews of research relevant to
hippocampal function, and Landfield and Deadwyler (1988) edited a series of 
articles on LTP. The November 1996 edition of the journal Hippocampus was 
devoted to theories of the role of the hippocampus in learning. 

